However, the Pixtral image processor also uses the default pad function from image_transforms.py, which pads with zeros, i.e. black pixels, by default.
Doesn't this mean you could end up with an image that has a white background and a black padded border? That doesn't seem right, but I'd like to know whether it's intentional.
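For illustration, here's a minimal sketch of the combination I'm asking about, using transformers.image_transforms.pad with its default constant_values=0.0 (the image size and padding amounts are made up for the example):

```python
import numpy as np
from transformers.image_transforms import pad

# A small all-white RGB image, standing in for an RGBA input that was
# composited onto a white background.
white = np.full((4, 4, 3), 255, dtype=np.uint8)

# The default mode is constant padding with constant_values=0.0,
# so the padded region comes out black.
padded = pad(white, padding=((2, 2), (2, 2)))

print(padded.shape)  # (8, 8, 3)
print(padded[0, 0])  # [0 0 0] -> a black border around a white image
```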
For example, the image conversion and padding functions used elsewhere in Hugging Face Transformers, such as for Qwen2-VL, seem to black out the alpha channel. This is probably not intentional, or rather, probably not meaningful; the developers likely just care about getting consistent results.
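As a reference point, here's a small PIL sketch of the two behaviors. This isn't the actual Transformers code, just an illustration of why dropping the alpha channel tends to produce black where compositing onto white would not:

```python
from PIL import Image

# A fully transparent RGBA image whose underlying RGB values happen to be black.
rgba = Image.new("RGBA", (2, 2), (0, 0, 0, 0))

# Naive conversion simply drops the alpha channel, so the "transparent"
# region comes out black instead of being composited onto a background.
naive = rgba.convert("RGB")
print(naive.getpixel((0, 0)))  # (0, 0, 0)

# Compositing onto an explicit white canvas first keeps the background white.
canvas = Image.new("RGB", rgba.size, (255, 255, 255))
canvas.paste(rgba, mask=rgba.getchannel("A"))
print(canvas.getpixel((0, 0)))  # (255, 255, 255)
```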
(I suspect it's left as-is because, with larger models, whether the padding background is black or white doesn't seem to have a significant impact on the results…)
Regarding Pixtral, the white fill seems to have been there since the first commit, so I wonder whether the model was trained with that in mind. Since there doesn't seem to be any discussion of it, I think you'd have to ask the committer on GitHub for clarification.